Highly multiplexed imaging has enabled the simultaneous spatial profiling of dozens of biological molecules in tissues at single-cell resolution. Extracting biologically relevant information such as the spatial distribution of cell phenotypes from multiplexed tissue imaging data involves a number of computational tasks, including image segmentation, feature extraction, and spatially-resolved single-cell analysis. Here, we present an end-to-end workflow for multiplexed tissue image processing and analysis, integrating a number of previously developed computational tools to enable these tasks in a user-friendly and customizable fashion. For data quality assessment, we highlight the utility of napari-imc for interactively inspecting raw imaging data and the cytomapper R/Bioconductor package for image visualization in R. Raw data preprocessing, image segmentation and feature extraction are performed using the steinbock toolkit. We showcase two alternative approaches for segmenting cells based on supervised pixel classification and pre-trained deep learning models. The extracted single-cell data is then read, processed and analyzed in R. The protocol describes the use of common data containers, facilitating the application of a number of R/Bioconductor packages for dimensionality reduction, single-cell visualization and phenotyping. We provide instructions to perform spatially-resolved single-cell analysis including community analysis, cellular neighborhood detection and cell-cell interaction testing using the imcRtools R/Bioconductor package. Overall, we provide the protocol for researchers with little bioinformatics training, and data analysis can be completed within 5-6 hours, depending on the segmentation approach. An extended version of the workflow can be accessed at https://bodenmillergroup.github.io/IMCDataAnalysis/.
options(timeout = 10000)
dir.create("data/steinbock/raw", recursive = TRUE)
download.file("https://zenodo.org/record/7624451/files/panel.csv",
"data/steinbock/panel.csv")
download.file("https://zenodo.org/record/7575859/files/Patient1.zip",
"data/steinbock/raw/Patient1.zip")
download.file("https://zenodo.org/record/7575859/files/Patient2.zip",
"data/steinbock/raw/Patient2.zip")
download.file("https://zenodo.org/record/7575859/files/Patient3.zip",
"data/steinbock/raw/Patient3.zip")
download.file("https://zenodo.org/record/7575859/files/Patient4.zip",
"data/steinbock/raw/Patient4.zip")
download.file("https://zenodo.org/record/7575859/files/compensation.zip",
"data/compensation.zip")
unzip("data/compensation.zip", exdir="data", overwrite=TRUE)
unlink("data/compensation.zip")
download.file("https://zenodo.org/record/7575859/files/sample_metadata.csv",
destfile = "data/sample_metadata.csv")
download.file("https://zenodo.org/record/7432486/files/gated_cells.zip",
"data/gated_cells.zip")
unzip("data/gated_cells.zip", exdir="data", overwrite=TRUE)
unlink("data/gated_cells.zip")
TODO
alias steinbock="docker run -v /path/to/IMCDataAnalysis/data/steinbock:/data -u $(id -u):$(id -g) ghcr.io/bodenmillergroup/steinbock:0.16.0"
In this protocol, data pre-processing refers to the extraction of multi-channel images from raw imaging data, and to preparing them for downstream processing. The required steps are dependent on the imaging technology; here, we showcase the pre-processing of raw IMC data which includes a hot pixel filtering step.
steinbock preprocess imc images --hpf 50
The step took TODO minutes.
Perform automatic deep learning-enabled single-cell segmentation using the pre-trained Mesmer neural network implemented in DeepCell. In the following command, channels will be min-max-normalized and mean-aggregated according to the deepcell column in the panel file.
steinbock segment deepcell --minmax
The step took TODO minutes.
For each image, extract the mean pixel intensity per cell and marker. The resulting cell-level intensity values are stored as separate CSV files (one file per image):
steinbock measure intensities
The step took TODO minutes.
For each image, extract standard morphological features (e.g., area, eccentricity) per cell. The resulting cell-level features are stored as separate CSV files (one file per image):
steinbock measure regionprops
The step took TODO minutes.
In each image, detect cells in close spatial proximity. The resulting spatial cell graphs are stored as separate directed edge lists in CSV format (one file per image):
steinbock measure neighbors --type expansion --dmax 4
The step took TODO minutes.
Read in the spatially-resolved single-cell data into R using the imcRtools package. For the rest of the protocol we will continue with the steinbock generated data.
library(imcRtools)
spe <- read_steinbock("data/steinbock/")
The step took 0.52 minutes.
After reading in the single-cell data, the SpatialExperiment object needs to be further processed.
First, the column names are set based on the image name and the cell identifier.
The patient identifier and the region of interest (ROI) identifier are saved in the object as well as the cancer type, which can be read in from the provided data/sample_metadata.csv file.
For easy access later on, the channels containing biological variation are selected.
Finally, the mean pixel intensities per channel and cell are arsinh-transformed using a cofactor of 1.
library(tidyverse)
colnames(spe) <- paste0(spe$sample_id, "_", spe$ObjectNumber)
# Read patient metadata
meta <- read_csv("data/sample_metadata.csv")
# Extract patient id and ROI id from sample name
spe$patient_id <- str_extract(spe$sample_id, "Patient[1-4]")
spe$ROI <- str_extract(spe$sample_id, "00[1-8]")
# Store cancer type in SPE object
spe$indication <- meta$Indication[match(spe$patient_id, meta$`Sample ID`)]
# Specify which channels to use for downstream analysis
rowData(spe)$use_channel <- !grepl("DNA|Histone", rownames(spe))
# Transform the counts
assay(spe, "exprs") <- asinh(counts(spe)/1)
The step took 0.04 minutes.
Read in multi-channel images as a CytoImageList container using the cytomapper package.
library(cytomapper)
images <- loadImages("data/steinbock/img/")
channelNames(images) <- rownames(spe)
The step took 0.24 minutes.
Read in segmentation masks as a CytoImageList container.
masks <- loadImages("data/steinbock/masks/", as.is = TRUE)
## All files in the provided location will be read in.
The step took 0 minutes.
For downstream visualization and analysis tasks, additional metadata needs to be added to the CytoImageList objects storing the multi-channel images and segmentation masks. Here, individual images, segmentation masks and entries in the SpatialExperiment object are matched via the sample_id entry.
# Extract patient id from image name
patient_id <- str_extract(names(images), "Patient[1-4]")
# Retrieve cancer type per patient from metadata file
indication <- meta$Indication[match(patient_id, meta$`Sample ID`)]
# Store patient and image level information in elementMetadata
mcols(images) <- mcols(masks) <- DataFrame(sample_id = names(images),
patient_id = patient_id,
indication = indication)
The step took 0 minutes.
Low signal spillover between neighbouring channels occurs when using technologies that rely on mass cytometry. Spillover is defined as a small proportion of the signal of a neighbouring channel which can be detected in the primary channel. As spillover is linear to the signal of the neighbouring channel it can be correct for using a previously described compensation approach. This phenomenon is IMC specific and the steps of the following section can be skipped when working with data generated by other multiplexed imaging technologies.
Read in data from the spillover slide for channel-to-channel spillover correction. The experimental procedure to create and acquire a spillover slide can be seen in Supplementary Note 2. As recommended by the CATALYST R/Bioconductor package, the pixel intensities are arsinh-transformed using a cofactor of 5.
sce <- readSCEfromTXT("data/compensation/")
## Spotted channels: Y89, In113, In115, Pr141, Nd142, Nd143, Nd144, Nd145, Nd146, Sm147, Nd148, Sm149, Nd150, Eu151, Sm152, Eu153, Sm154, Gd155, Gd156, Gd158, Tb159, Gd160, Dy161, Dy162, Dy163, Dy164, Ho165, Er166, Er167, Er168, Tm169, Er170, Yb171, Yb172, Yb173, Yb174, Lu175, Yb176
## Acquired channels: Ar80, Y89, In113, In115, Xe131, Xe134, Ba136, La138, Pr141, Nd142, Nd143, Nd144, Nd145, Nd146, Sm147, Nd148, Sm149, Nd150, Eu151, Sm152, Eu153, Sm154, Gd155, Gd156, Gd158, Tb159, Gd160, Dy161, Dy162, Dy163, Dy164, Ho165, Er166, Er167, Er168, Tm169, Er170, Yb171, Yb172, Yb173, Yb174, Lu175, Yb176, Ir191, Ir193, Pt196, Pb206
## Channels spotted but not acquired:
## Channels acquired but not spotted: Ar80, Xe131, Xe134, Ba136, La138, Ir191, Ir193, Pt196, Pb206
assay(sce, "exprs") <- asinh(counts(sce)/5)
The step took 0.05 minutes.
Perform quality assessment of the spillover data by visualizing the median pixel intensity per channel and spotted metal.
plotSpotHeatmap(sce)
(optional) Perform pixel binning to increase median pixel intensity. This is only needed if pixel intensities are too low (median below ~200 counts).
sce2 <- binAcrossPixels(sce, bin_size = 10)
The step took 0.24 minutes.
Filter incorrectly assigned pixels. The following step uses functions provided by the CATALYST package to “de-barcode” the pixels. Based on the intensity distribution of all channels, pixels are assigned to their corresponding barcode; here, this is the already known metal spot. This procedure identifies pixels that cannot be robustly assigned to the spotted metal. Pixels of such kind can be regarded as “noisy”, “background”, or “artifacts” that should be removed prior to spillover estimation. The spotted channels (bc_key) need to be specified. The general workflow for pixel de-barcoding is as follows:
In cases where incorrect assignments occurred or where few pixels were measured for some spots, the imcRtools package exports a helper function to exclude pixels.
library(CATALYST)
bc_key <- as.numeric(unique(sce$sample_mass))
bc_key <- bc_key[order(bc_key)]
sce <- assignPrelim(sce, bc_key = bc_key)
## Debarcoding data...
## o ordering
## o classifying events
## Normalizing...
## Computing deltas...
sce <- estCutoffs(sce)
sce <- applyCutoffs(sce)
The step took 0.13 minutes.
Assess incorretly assigned pixels:
library(pheatmap)
cur_table <- table(sce$bc_id, sce$sample_mass)
# Visualize the correctly and incorrectly assigned pixels
pheatmap(log10(cur_table + 1),
cluster_rows = FALSE,
cluster_cols = FALSE)
# Compute the fraction of unassigned pixels per spot
cur_table["0",] / colSums(cur_table)
## 113 115 141 142 143 144 145 146 147 148 149
## 0.1985 0.1060 0.2575 0.3195 0.3190 0.3825 0.3545 0.4280 0.3570 0.4770 0.4200
## 150 151 152 153 154 155 156 158 159 160 161
## 0.4120 0.4025 0.4050 0.4630 0.4190 0.4610 0.3525 0.4020 0.4655 0.4250 0.5595
## 162 163 164 165 166 167 168 169 170 171 172
## 0.4340 0.4230 0.4390 0.4055 0.5210 0.3900 0.3285 0.3680 0.5015 0.4900 0.5650
## 173 174 175 176 89
## 0.3125 0.4605 0.4710 0.2845 0.3015
# Filter pixels that were assigned to a mass other than the spotted mass
sce <- filterPixels(sce, minevents = 40, correct_pixels = TRUE)
Compute and store the spillover matrix using the CATALYST package.
sce <- computeSpillmat(sce)
sm <- metadata(sce)$spillover_matrix
The step took 0.03 minutes.
Perform single-cell data compensation using the CATALYST package. The compCytof function corrects channel-to-channel spillover directly on the single-cell intensities using the previously estimated spillover matrix. The isotope_list variable needs to be extended by isotopes that are not contained in this list provided by the CATALYST package. Visualization of marker intensities of neighboring channels (e.g., Yb173 and Yb174) before and after correction can be used to assess the spillover correction efficacy.
library(dittoSeq)
library(patchwork)
# Specify the channel_name entry for use with CATALYST
rowData(spe)$channel_name <- paste0(rowData(spe)$channel, "Di")
# Add the missing metal isotope to the isotope_list
isotope_list <- CATALYST::isotope_list
isotope_list$Ar <- 80
spe <- compCytof(spe, sm,
transform = TRUE, cofactor = 1,
isotope_list = isotope_list,
overwrite = FALSE)
# Visualize biaxial plots before/after compensation
before <- dittoScatterPlot(spe, x.var = "Ecad", y.var = "CD303",
assay.x = "exprs", assay.y = "exprs") +
ggtitle("Before compensation")
after <- dittoScatterPlot(spe, x.var = "Ecad", y.var = "CD303",
assay.x = "compexprs", assay.y = "compexprs") +
ggtitle("After compensation")
before + after
# Replace uncompensated assays
assay(spe, "counts") <- assay(spe, "compcounts")
assay(spe, "exprs") <- assay(spe, "compexprs")
assay(spe, "compcounts") <- assay(spe, "compexprs") <- NULL
The step took 0.13 minutes.
Perform channel-to-channel spillover correction on multi-channel images. To this end, the previously computed spillover matrix needs to be adjusted to only retain channels that are stored in the multi-channel images. By visualizing neighboring channels, spillover correction efficacy can be assessed.
# Use mass tags as channel names
channelNames(images) <- rowData(spe)$channel_name
# Adapt spillover matrix to retain channels contained in the multi-channel images
adapted_sm <- adaptSpillmat(sm, channelNames(images),
isotope_list = isotope_list)
## Compensation is likely to be inaccurate.
## Spill values for the following interactions
## have not been estimated:
## Ir191Di -> Ir193Di
## Ir193Di -> Ir191Di
# Perform image compensation
images_comp <- compImage(images, adapted_sm)
# Visualize neighboring channels before/ after compensation
plotPixels(images[5], colour_by = "Yb173Di",
image_title = list(text = "Yb173 (Ecad) - before",
position = "topleft"),
legend = NULL, bcg = list(Yb173Di = c(0, 4, 1)))
plotPixels(images[5], colour_by = "Yb174Di",
image_title = list(text = "Yb174 (CD303) - before",
position = "topleft"),
legend = NULL, bcg = list(Yb174Di = c(0, 4, 1)))
plotPixels(images_comp[5], colour_by = "Yb173Di",
image_title = list(text = "Yb173 (Ecad) - after",
position = "topleft"),
legend = NULL, bcg = list(Yb173Di = c(0, 4, 1)))
plotPixels(images_comp[5], colour_by = "Yb174Di",
image_title = list(text = "Yb174 (CD303) - after",
position = "topleft"),
legend = NULL, bcg = list(Yb174Di = c(0, 4, 1)))
# Switch back to using target names as channel names
channelNames(images_comp) <- rownames(spe)
The step took 8.92 minutes.
Outline cells on composite images for visual assessment of segmentation quality. For visualization purposes, we subset 3 images and outline all cells on composite images after channel normalization.
set.seed(20220118)
img_ids <- sample(seq_len(length(images_comp)), 3)
cur_images <- images_comp[img_ids]
# Normalize each channel between 0 and 1
cur_images <- normalize(cur_images, separateImages = TRUE)
# Clip channel intensities at 0 and 0.2
cur_images <- normalize(cur_images, inputRange = c(0, 0.2))
plotPixels(cur_images,
mask = masks[img_ids],
img_id = "sample_id",
missing_colour = "white",
colour_by = c("CD163", "CD20", "CD3", "Ecad", "DNA1"),
colour = list(CD163 = c("black", "yellow"),
CD20 = c("black", "red"),
CD3 = c("black", "green"),
Ecad = c("black", "cyan"),
DNA1 = c("black", "blue")),
image_title = NULL,
legend = list(colour_by.title.cex = 0.9,
colour_by.labels.cex = 0.9))
Visualize the distribution of the cell area and filter out small cells.
dittoPlot(spe, var = "area",
group.by = "sample_id",
plots = "boxplot") +
ylab("Cell area") + xlab("")
spe <- spe[,spe$area >= 5]
Visualize the cell density per image.
# Compute the cell density per image
cell_density <- colData(spe) %>%
as.data.frame() %>%
group_by(sample_id) %>%
# Compute the number of pixels covered by cells and the total number of pixels
summarize(cell_area = sum(area),
no_pixels = mean(width_px) * mean(height_px)) %>%
# Divide the total number of pixels by the number of pixels covered by cells
mutate(covered_area = cell_area / no_pixels)
# Visualize the cell density per image
ggplot(cell_density) +
geom_point(aes(sample_id, covered_area)) +
theme_minimal(base_size = 15) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 15)) +
ylim(c(0, 1)) +
ylab("% covered area") + xlab("")
Visualize staining differences between samples for selected markers. Together with the visualization of cells in low dimensions, this figure indicates sample-to-sample differences in marker expression.
multi_dittoPlot(spe, vars = c("HLADR", "CD3", "Ecad", "PDGFRb"),
group.by = "patient_id", plots = c("ridgeplot"),
assay = "exprs")
## Picking joint bandwidth of 0.162
## Picking joint bandwidth of 0.0982
## Picking joint bandwidth of 0.151
## Picking joint bandwidth of 0.089
Visualize low-dimensional embeddings of single cells. Here, we use the scater package to compute a Uniform Manifold Approximation and Projection (UMAP) embedding and visualize cells in low-dimensional space.
library(scater)
## Loading required package: scuttle
set.seed(220225)
spe <- runUMAP(spe, subset_row = rowData(spe)$use_channel,
exprs_values = "exprs")
# Visualize the UMAP embedding and color cells by patient ID
dittoDimPlot(spe, var = "patient_id",
reduction.use = "UMAP", size = 0.2) +
ggtitle("Patient ID on UMAP")
The step took 0.57 minutes.
Perform batch correction to remove sample-to-sample differences. Here, we use the fastMNN method of the batchelor package.
library(batchelor)
set.seed(220228)
# Perform batch correction
out <- fastMNN(spe, batch = spe$patient_id,
auto.merge = TRUE,
subset.row = rowData(spe)$use_channel,
assay.type = "exprs")
# Store corrected embeddings in SPE object
reducedDim(spe, "fastMNN") <- reducedDim(out, "corrected")
# Compute UMAP on corrected embeddings
spe <- runUMAP(spe, dimred= "fastMNN", name = "UMAP_mnnCorrected")
# Visualize corrected UMAP
dittoDimPlot(spe, var = "patient_id",
reduction.use = "UMAP_mnnCorrected", size = 0.2) +
ggtitle("Patient ID on UMAP after correction")
The step took 2.41 minutes.
Define cellular phenotypes. For this, single cells can be clustered (A) or cells can be labelled via classification (B).
Graph-based clustering is performed using functions from the bluster and scran R/Bioconductor packages. Alternatively, other approaches such as phenograph or FlowSOM can be used to cluster single cells.
Estimate optimal clustering parameters for graph-based clustering on the integrated cells after batch correction. We perform a sweep across possible combinations of clustering parameters, including the number of nearest neighbors to consider (k) and the edge weighting method (type). We keep the community detection algorithm (cluster.fun) fixed, as the Louvain method is one of the most commonly used algorithms for graph-based clustering. To assess cluster stability, we compute the mean silhouette width across all cells and select the cluster parameter combination with highest mean silhouette width.
library(bluster)
library(BiocParallel)
# Select the corrected cell embeddings for clustering
mat <- reducedDim(spe, "fastMNN")
# Perform the cluster sweep
combinations <- clusterSweep(mat, BLUSPARAM=SNNGraphParam(),
k=c(10L, 20L),
type = c("rank", "jaccard"),
cluster.fun = "louvain",
BPPARAM = SerialParam(RNGseed = 230214))
# Compute the average silhouette width per parameter combination
sil <- vapply(as.list(combinations$clusters),
function(x) mean(approxSilhouette(mat, x)$width), 0)
# Visualize the average silhouette width per parameter combination
ggplot(data.frame(method = names(sil),
sil = sil)) +
geom_point(aes(method, sil), size = 3) +
theme_classic(base_size = 15) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Cluster parameter combination") +
ylab("Average silhouette width")
The step took 4.88 minutes.
Based on the selected parameters, cells are clustered using a graph-based algorithm. As observed above, the parameter setting for k=20 and type=”rank” should result in well-separated clusters. The cluster identifiers are then saved in the SpatialExperiment object.
library(scran)
clusters <- clusterCells(spe[rowData(spe)$use_channel,],
use.dimred = "fastMNN",
BLUSPARAM = SNNGraphParam(k=20,
cluster.fun = "louvain",
type = "rank"))
spe$nn_clusters <- clusters
The step took 1.24 minutes.
Visualize the marker expression per cluster
library(viridis)
set.seed(220619)
cur_cells <- sample(seq_len(ncol(spe)), 2000)
dittoHeatmap(spe[,cur_cells],
genes = rownames(spe)[rowData(spe)$use_channel],
assay = "exprs", scale = "none",
heatmap.colors = viridis(100),
annot.by = c("nn_clusters", "patient_id"))
Label clusters
cluster_celltype <- recode(spe$nn_clusters,
"1" = "Tumor_proliferating",
"2" = "Myeloid",
"3" = "Tumor",
"4" = "Tumor",
"5" = "Stroma",
"6" = "Proliferating",
"7" = "Myeloid",
"8" = "CD8",
"9" = "Plasma_cell",
"10" = "CD4",
"11" = "Bcell",
"12" = "Stroma",
"13" = "pDC")
spe$cluster_celltype <- cluster_celltype
Gate individual cell types based on their marker expression. For this, the cytomapper package provides the cytomapperShiny function. Per image, cells are gated based on their marker expression in a hierarchical fashion to define expected cell types. The gated cells are then visualized as outlines on pseudo-coloured composite images. Once the correct cells are labeled, they can be downloaded as a SpatialExperiment object storing only the selected cells. During download, the cell label can be specified, which is stored in the cytomapper_CellLabel entry of the colData slot for later use in training a classifier.
if (interactive()){
cytomapperShiny(object = spe, mask = masks, image = images_comp,
cell_id = "ObjectNumber", img_id = "sample_id")
}
Read in files containing the gated cells and concatenate them into a single SpatialExperiment object.
library(SingleCellExperiment)
label_files <- list.files("data/gated_cells",
full.names = TRUE, pattern = ".rds$")
# Read in individual files containing gated cells
spes <- lapply(label_files, readRDS)
# Concatenate the individual objects into one
concat_spe <- do.call("cbind", spes)
Remove doublets and reassign tumor cells. As a result of the multi-step labeling approach, some cells may have been labeled multiple times. In cases where one cell was labeled as both tumor and immune cell, we keep the immune cell label, as these cells are most likely immune cells residing within the tumor. All other cells that were labeled multiple times are removed. Finally, the labels are stored in the main SpatialExperiment object.
Define filtering function:
filter_labels <- function(object,
label = "cytomapper_CellLabel") {
cur_tab <- unclass(table(colnames(object), object[[label]]))
cur_labels <- colnames(cur_tab)[apply(cur_tab, 1, which.max)]
names(cur_labels) <- rownames(cur_tab)
cur_labels <- cur_labels[rowSums(cur_tab) == 1]
return(cur_labels)
}
labels <- filter_labels(concat_spe)
cur_spe <- concat_spe[,concat_spe$cytomapper_CellLabel != "Tumor"]
non_tumor_labels <- filter_labels(cur_spe)
additional_cells <- setdiff(names(non_tumor_labels), names(labels))
final_labels <- c(labels, non_tumor_labels[additional_cells])
# Transfer labels to SPE object
spe_labels <- rep("unlabeled", ncol(spe))
names(spe_labels) <- colnames(spe)
spe_labels[names(final_labels)] <- final_labels
spe$cell_labels <- spe_labels
Train a random forest classifier for cell type classification of unlabelled cells. The cells are first split into labelled and unlabelled cells. We then perform a 75/25 split of the labelled cells to select training and testing datasets, respectively. Based on the training dataset, we perform a 5-fold cross validation to tune the random forest model parameter.
library(caret)
# Split between labeled and unlabeled cells
lab_spe <- spe[,spe$cell_labels != "unlabeled"]
unlab_spe <- spe[,spe$cell_labels == "unlabeled"]
# Randomly split into train and test data
set.seed(221029)
trainIndex <- createDataPartition(factor(lab_spe$cell_labels), p = 0.75)
train_spe <- lab_spe[,trainIndex$Resample1]
test_spe <- lab_spe[,-trainIndex$Resample1]
# Define fit parameters for 5-fold cross validation
fitControl <- trainControl(method = "cv",
number = 5)
# Select the asinh-transformed counts for training
cur_mat <- t(assay(train_spe, "exprs")[rowData(train_spe)$use_channel,])
# Train a random forest classifier
rffit <- train(x = cur_mat,
y = factor(train_spe$cell_labels),
method = "rf", ntree = 1000,
tuneLength = 5,
trControl = fitControl)
The step took 9.65 minutes.
Assess the classifier performance by computing the confusion matrix of the test dataset. The confusionMatrix function compares the predicted cell labels to the ground truth cell labels and computes a number of performance metrics. A high sensitivity and a high specificity for each cell type label is to be desired.
# Select the arsinh-transformed counts of the test data
cur_mat <- t(assay(test_spe, "exprs")[rowData(test_spe)$use_channel,])
# Predict the cell phenotype labels of the test data
cur_pred <- predict(rffit,
newdata = cur_mat)
# Compute the confusion matrix between predicted and actual labels
confusionMatrix(data = cur_pred,
reference = factor(test_spe$cell_labels),
mode = "everything")
## Confusion Matrix and Statistics
##
## Reference
## Prediction Bcell BnTcell CD4 CD8 Myeloid Neutrophil Plasma_cell Stroma
## Bcell 186 2 0 0 0 0 6 0
## BnTcell 4 423 1 0 0 0 0 0
## CD4 0 0 163 0 0 2 3 2
## CD8 0 0 0 199 0 0 8 0
## Myeloid 0 0 2 1 437 0 0 0
## Neutrophil 0 0 0 0 0 30 0 0
## Plasma_cell 1 0 3 2 0 0 158 0
## Stroma 0 0 2 0 0 0 0 108
## Treg 0 0 0 0 0 0 3 0
## Tumor 4 0 1 3 0 1 1 0
## Reference
## Prediction Treg Tumor
## Bcell 0 1
## BnTcell 0 1
## CD4 0 5
## CD8 0 3
## Myeloid 0 0
## Neutrophil 0 0
## Plasma_cell 1 0
## Stroma 0 0
## Treg 89 2
## Tumor 0 1487
##
## Overall Statistics
##
## Accuracy : 0.9806
## 95% CI : (0.9753, 0.985)
## No Information Rate : 0.4481
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9741
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Bcell Class: BnTcell Class: CD4 Class: CD8
## Sensitivity 0.95385 0.9953 0.94767 0.97073
## Specificity 0.99714 0.9979 0.99622 0.99650
## Pos Pred Value 0.95385 0.9860 0.93143 0.94762
## Neg Pred Value 0.99714 0.9993 0.99716 0.99809
## Precision 0.95385 0.9860 0.93143 0.94762
## Recall 0.95385 0.9953 0.94767 0.97073
## F1 0.95385 0.9906 0.93948 0.95904
## Prevalence 0.05830 0.1271 0.05142 0.06129
## Detection Rate 0.05561 0.1265 0.04873 0.05949
## Detection Prevalence 0.05830 0.1283 0.05232 0.06278
## Balanced Accuracy 0.97549 0.9966 0.97195 0.98361
## Class: Myeloid Class: Neutrophil Class: Plasma_cell
## Sensitivity 1.0000 0.909091 0.88268
## Specificity 0.9990 1.000000 0.99779
## Pos Pred Value 0.9932 1.000000 0.95758
## Neg Pred Value 1.0000 0.999095 0.99340
## Precision 0.9932 1.000000 0.95758
## Recall 1.0000 0.909091 0.88268
## F1 0.9966 0.952381 0.91860
## Prevalence 0.1306 0.009865 0.05351
## Detection Rate 0.1306 0.008969 0.04723
## Detection Prevalence 0.1315 0.008969 0.04933
## Balanced Accuracy 0.9995 0.954545 0.94024
## Class: Stroma Class: Treg Class: Tumor
## Sensitivity 0.98182 0.98889 0.9920
## Specificity 0.99938 0.99846 0.9946
## Pos Pred Value 0.98182 0.94681 0.9933
## Neg Pred Value 0.99938 0.99969 0.9935
## Precision 0.98182 0.94681 0.9933
## Recall 0.98182 0.98889 0.9920
## F1 0.98182 0.96739 0.9927
## Prevalence 0.03288 0.02691 0.4481
## Detection Rate 0.03229 0.02661 0.4445
## Detection Prevalence 0.03288 0.02810 0.4475
## Balanced Accuracy 0.99060 0.99368 0.9933
Predict cell labels of unlabelled cells. Cells for which the highest class probability is below 40% are labeled as “unknown”.
# Select the arsinh-transformed counts of the unlabeled data for prediction
cur_mat <- t(assay(unlab_spe, "exprs")[rowData(unlab_spe)$use_channel,])
# Predict the cell phenotype labels of the unlabeled data
cell_class <- as.character(predict(rffit,
newdata = cur_mat,
type = "raw"))
names(cell_class) <- rownames(cur_mat)
# Extract prediction probabilities for each cell
cell_prob <- predict(rffit,
newdata = cur_mat,
type = "prob")
# Label cells with a low prediction probability as undefined
cell_class[rowMax(as.matrix(cell_prob)) < 0.4] <- "undefined"
# Store cell phenotype labels in SPE object
cell_labels <- spe$cell_labels
cell_labels[colnames(unlab_spe)] <- cell_class
spe$celltype <- cell_labels
Visualize cell type and cluster labels on the UMAP embedding for qualitatively assessing cell phenotyping.
p1 <- dittoDimPlot(spe, var = "celltype",
reduction.use = "UMAP_mnnCorrected", size = 0.2,
do.label = TRUE) +
ggtitle("Cell types on UMAP, integrated cells")
p2 <- dittoDimPlot(spe, var = "cluster_celltype",
reduction.use = "UMAP_mnnCorrected", size = 0.2,
do.label = TRUE) +
ggtitle("Cluster cell types on UMAP, integrated cells")
p1 + p2
Visualize mean marker expression per cell type and per annotated cluster as heatmaps.
library(scuttle)
# Calculate the mean of the arsinh-transformed counts per cell phenotype
celltype_mean <- aggregateAcrossCells(as(spe, "SingleCellExperiment"),
ids = spe$celltype,
statistics = "mean",
use.assay.type = "exprs",
subset_row = rowData(spe)$use_channel)
dittoHeatmap(celltype_mean,
assay = "exprs", cluster_cols = TRUE,
scale = "none",
heatmap.colors = viridis(100),
annot.by = c("celltype","ncells"))
# Calculate the mean of the arsinh-transformed counts per annotated cluster
cluster_mean <- aggregateAcrossCells(as(spe, "SingleCellExperiment"),
ids = spe$cluster_celltype,
statistics = "mean",
use.assay.type = "exprs",
subset_row = rowData(spe)$use_channel)
dittoHeatmap(cluster_mean,
assay = "exprs", cluster_cols = TRUE,
scale = "none",
heatmap.colors = viridis(100),
annot.by = c("cluster_celltype","ncells"))
Perform spatial community analysis. This method groups cells solely based on their location in the tissue by using a previously constructed spatial cell graph. We perform community detection separately for tumor and non-tumor cells.
# Define if cells are part of the tumor or stroma
spe$tumor_stroma <- ifelse(spe$celltype == "Tumor", "Tumor", "Stroma")
# Detect spatial communities
spe <- detectCommunity(spe,
colPairName = "neighborhood",
size_threshold = 10,
group_by = "tumor_stroma",
BPPARAM = SerialParam(RNGseed = 220819))
# Visualize spatial tumor communities
plotSpatial(spe[,spe$celltype == "Tumor"],
node_color_by = "spatial_community",
img_id = "sample_id",
node_size_fix = 0.5) +
theme(legend.position = "none") +
scale_color_manual(values = rev(colors()))
The step took 0.07 minutes.
Perform cellular neighborhood (CN) analysis. CNs are tissue regions with characteristic cell type composition and represent sites of unique local biological processes and interactions. We first construct a spatial 20-nearest neighbor graph based on the cells’ centroids, to include a larger number of neighbors compared to the spatial cell graph constructed earlier using steinbock. The aggregateNeighbors function of the imcRtools package then computes for each cell the cell type fraction among its 20-nearest neighbors. Cells are subsequently clustered using k-means clustering to detect CNs. Finally, these can be spatially visualized and the cell type fraction per CN can be computed.
# Construct a 20-nearest neighbor graph
spe <- buildSpatialGraph(spe, img_id = "sample_id", type = "knn", k = 20)
# Compute the fraction of cell phenotypes in each cells neighborhood
spe <- aggregateNeighbors(spe, colPairName = "knn_interaction_graph",
aggregate_by = "metadata", count_by = "celltype")
set.seed(220705)
# Cluster cells based on cell phenotype fractions
cn_1 <- kmeans(spe$aggregatedNeighbors, centers = 6)
spe$cn_celltypes <- as.factor(cn_1$cluster)
# Visualize cellular neighborhoods
plotSpatial(spe,
node_color_by = "cn_celltypes",
img_id = "sample_id",
node_size_fix = 0.5) +
scale_color_brewer(palette = "Set3")
# Compute the fraction of cell phenotypes in each cellular neighborhood
mat <- prop.table(table(spe$cn_celltypes, spe$celltype),
margin = 1)
# Visualize cell phenotype fractions in each cellular neighborhood
pheatmap(mat,
color = colorRampPalette(c("dark blue", "white", "dark red"))(100),
scale = "column")
The step took 0.19 minutes.
Perform spatial context (SC) analysis. SCs build up on the concept of CNs and are regions in which the local biological processes of CNs interact, and where specialized biological events take place. We construct a second k-nearest neighbor graph with larger k (k=40) to include cells across a length scale on which biological signals could be exchanged. For each cell, the aggregateNeighbours function computes the fraction of CNs among its 40 nearest neighbors. The detectSpatialContext function sorts the CN fractions from high to low, and the SC of each cell is assigned as the minimal combination of CNs that additively surpass a user-defined threshold (here threshold=0.9). After filtering the detected SCs, we can spatially visualize them and represent SC interactions as a graph.
# Construct a 40-nearest neighbor graph
spe <- buildSpatialGraph(spe, img_id = "sample_id",
type = "knn",
name = "knn_spatialcontext_graph",
k = 40)
# Compute the fraction of cellular neighborhoods around each cell
spe <- aggregateNeighbors(spe,
colPairName = "knn_spatialcontext_graph",
aggregate_by = "metadata",
count_by = "cn_celltypes",
name = "aggregatedNeighborhood")
# Detect spatial contexts
spe <- detectSpatialContext(spe,
entry = "aggregatedNeighborhood",
threshold = 0.90,
name = "spatial_context")
# Filter out small and infrequent spatial contexts
spe <- filterSpatialContext(spe,
entry = "spatial_context",
group_by = "patient_id",
group_threshold = 3,
cells_threshold = 100,
name = "spatial_context_filtered")
# Visualize cell location colored by their spatial context
plotSpatial(spe,
node_color_by = "spatial_context_filtered",
img_id = "sample_id",
node_size_fix = 0.5,
colPairName = "knn_spatialcontext_graph")
# Visualize spatial contexts as graph
plotSpatialContext(spe,
entry = "spatial_context_filtered",
group_by = "sample_id",
node_color_by = "n_cells",
node_size_by = "n_group",
node_label_color_by = "n_cells") +
scale_color_viridis()
The step took 0.34 minutes.
Perform patch detection analysis. The patchDetection function of the imcRtools package detects fully connected components of cells of interest, constructs a convex hull around each component, and expands this hull to include neighboring cells. Below, we detect connected tumor components made up of at least 10 cells, and we slightly expand the convex hull to include cells within the patch.
spe <- patchDetection(spe,
patch_cells = spe$celltype == "Tumor",
img_id = "sample_id",
expand_by = 1,
min_patch_size = 10,
colPairName = "neighborhood")
plotSpatial(spe,
node_color_by = "patch_id",
img_id = "sample_id",
node_size_fix = 0.5) +
theme(legend.position = "none") +
scale_color_manual(values = rev(colors()))
The step took 0.41 minutes.
Perform interaction analysis. This approach detects cell type pairs that show stronger (“interaction”) or weaker (“avoidance”) co-localization compared to a random distribution of cell types. Using a previously constructed spatial cell graph (here the one produced by steinbock), the testInteraction function of the imcRtools package computes the average interaction count for each cell type pair per image and compares it against an empirical null distribution derived by permuting all cell labels. The returned data frame contains one entry per cell type pair for each image indicating the empirical p-value and statistical significance (interaction: 1, no significance: 0, avoidance: -1). These significance values can be summed across all images and visualized in the form of a heatmap.
library(scales)
set.seed(220825)
out <- testInteractions(spe,
group_by = "sample_id",
label = "celltype",
colPairName = "neighborhood",
BPPARAM = SerialParam(RNGseed = 221029))
# Sum interaction values across all images
summed_sigvals <- out %>% as_tibble() %>%
group_by(from_label, to_label) %>%
summarize(sum_sigval = sum(sigval, na.rm = TRUE))
# Visualize summed interaction values in form of a heatmap
ggplot(summed_sigvals) +
geom_tile(aes(from_label, to_label, fill = sum_sigval)) +
scale_fill_gradient2(low = muted("blue"),
mid = "white",
high = muted("red")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The step took 6.85 minutes.
We finally save out the SpatialExperiment object.
saveRDS(spe, "data/spe.rds")
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scales_1.2.1 caret_6.0-93
## [3] lattice_0.20-45 viridis_0.6.2
## [5] viridisLite_0.4.1 scran_1.26.2
## [7] BiocParallel_1.32.5 bluster_1.8.0
## [9] batchelor_1.14.1 scater_1.26.1
## [11] scuttle_1.8.4 patchwork_1.1.2
## [13] dittoSeq_1.10.0 pheatmap_1.0.12
## [15] CATALYST_1.22.0 cytomapper_1.10.1
## [17] EBImage_4.40.0 forcats_1.0.0
## [19] stringr_1.5.0 dplyr_1.1.0
## [21] purrr_1.0.1 readr_2.1.3
## [23] tidyr_1.3.0 tibble_3.1.8
## [25] ggplot2_3.4.0 tidyverse_1.3.2
## [27] imcRtools_1.4.2 SpatialExperiment_1.8.0
## [29] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
## [31] Biobase_2.58.0 GenomicRanges_1.50.2
## [33] GenomeInfoDb_1.34.9 IRanges_2.32.0
## [35] S4Vectors_0.36.1 BiocGenerics_0.44.0
## [37] MatrixGenerics_1.10.0 matrixStats_0.63.0
## [39] BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] rsvd_1.0.5 svglite_2.1.1
## [3] class_7.3-20 fftwtools_0.9-11
## [5] V8_4.2.2 foreach_1.5.2
## [7] crayon_1.5.2 MASS_7.3-58.1
## [9] rhdf5filters_1.10.0 nlme_3.1-160
## [11] backports_1.4.1 reprex_2.0.2
## [13] rlang_1.0.6 XVector_0.38.0
## [15] readxl_1.4.2 irlba_2.3.5.1
## [17] limma_3.54.1 rjson_0.2.21
## [19] bit64_4.0.5 glue_1.6.2
## [21] parallel_4.2.2 vipor_0.4.5
## [23] classInt_0.4-8 shinydashboard_0.7.2
## [25] haven_2.5.1 tidyselect_1.2.0
## [27] distances_0.1.9 XML_3.99-0.13
## [29] zoo_1.8-11 sf_1.0-9
## [31] ggpubr_0.5.0 nnls_1.4
## [33] xtable_1.8-4 magrittr_2.0.3
## [35] evaluate_0.20 cli_3.6.0
## [37] zlibbioc_1.44.0 rstudioapi_0.14
## [39] sp_1.6-0 bslib_0.4.2
## [41] rpart_4.1.19 shiny_1.7.4
## [43] BiocSingular_1.14.0 xfun_0.37
## [45] clue_0.3-64 cluster_2.1.4
## [47] tidygraph_1.2.3 ggrepel_0.9.3
## [49] listenv_0.9.0 png_0.1-8
## [51] future_1.31.0 ipred_0.9-13
## [53] withr_2.5.0 bitops_1.0-7
## [55] ggforce_0.4.1 plyr_1.8.8
## [57] cellranger_1.1.0 RTriangle_1.6-0.12
## [59] hardhat_1.2.0 e1071_1.7-13
## [61] dqrng_0.3.0 pROC_1.18.0
## [63] pillar_1.8.1 GlobalOptions_0.1.2
## [65] cachem_1.0.6 multcomp_1.4-20
## [67] fs_1.6.1 raster_3.6-14
## [69] GetoptLong_1.0.5 DelayedMatrixStats_1.20.0
## [71] vctrs_0.5.2 ellipsis_0.3.2
## [73] generics_0.1.3 lava_1.7.1
## [75] tools_4.2.2 beeswarm_0.4.0
## [77] munsell_0.5.0 tweenr_2.0.2
## [79] proxy_0.4-27 DelayedArray_0.24.0
## [81] fastmap_1.1.0 compiler_4.2.2
## [83] abind_1.4-5 httpuv_1.6.8
## [85] GenomeInfoDbData_1.2.9 prodlim_2019.11.13
## [87] gridExtra_2.3 edgeR_3.40.2
## [89] ggnewscale_0.4.8 utf8_1.2.3
## [91] later_1.3.0 recipes_1.0.4
## [93] jsonlite_1.8.4 concaveman_1.1.0
## [95] ScaledMatrix_1.6.0 carData_3.0-5
## [97] sparseMatrixStats_1.10.0 promises_1.2.0.1
## [99] car_3.1-1 doParallel_1.0.17
## [101] R.utils_2.12.2 rmarkdown_2.20
## [103] sandwich_3.0-2 cowplot_1.1.1
## [105] textshaping_0.3.6 statmod_1.5.0
## [107] Rtsne_0.16 uwot_0.1.14
## [109] igraph_1.3.5 HDF5Array_1.26.0
## [111] survival_3.4-0 ResidualMatrix_1.8.0
## [113] yaml_2.3.7 plotrix_3.8-2
## [115] systemfonts_1.0.4 cytolib_2.10.1
## [117] htmltools_0.5.4 locfit_1.5-9.7
## [119] graphlayouts_0.8.4 digest_0.6.31
## [121] assertthat_0.2.1 mime_0.12
## [123] tiff_0.1-11 units_0.8-1
## [125] future.apply_1.10.0 data.table_1.14.6
## [127] R.oo_1.25.0 flowCore_2.10.0
## [129] drc_3.0-1 ragg_1.2.5
## [131] splines_4.2.2 labeling_0.4.2
## [133] Rhdf5lib_1.20.0 googledrive_2.0.0
## [135] RCurl_1.98-1.10 broom_1.0.3
## [137] hms_1.1.2 modelr_0.1.10
## [139] rhdf5_2.42.0 colorspace_2.1-0
## [141] DropletUtils_1.18.1 ConsensusClusterPlus_1.62.0
## [143] BiocManager_1.30.19 ggbeeswarm_0.7.1
## [145] shape_1.4.6 nnet_7.3-18
## [147] sass_0.4.5 Rcpp_1.0.10
## [149] bookdown_0.32 mvtnorm_1.1-3
## [151] circlize_0.4.15 FlowSOM_2.6.0
## [153] RProtoBufLib_2.10.0 fansi_1.0.4
## [155] tzdb_0.3.0 parallelly_1.34.0
## [157] ModelMetrics_1.2.2.2 R6_2.5.1
## [159] grid_4.2.2 ggridges_0.5.4
## [161] lifecycle_1.0.3 curl_5.0.0
## [163] ggsignif_0.6.4 googlesheets4_1.0.1
## [165] jquerylib_0.1.4 Matrix_1.5-3
## [167] RcppAnnoy_0.0.20 TH.data_1.1-1
## [169] RColorBrewer_1.1-3 iterators_1.0.14
## [171] gower_1.0.1 svgPanZoom_0.3.4
## [173] htmlwidgets_1.6.1 beachmat_2.14.0
## [175] polyclip_1.10-4 timechange_0.2.0
## [177] terra_1.7-3 rvest_1.0.3
## [179] ComplexHeatmap_2.14.0 globals_0.16.2
## [181] codetools_0.2-18 lubridate_1.9.1
## [183] metapod_1.6.0 randomForest_4.7-1.1
## [185] gtools_3.9.4 dbplyr_2.3.0
## [187] R.methodsS3_1.8.2 gtable_0.3.1
## [189] DBI_1.1.3 httr_1.4.4
## [191] highr_0.10 KernSmooth_2.23-20
## [193] stringi_1.7.12 vroom_1.6.1
## [195] reshape2_1.4.4 farver_2.1.1
## [197] magick_2.7.3 timeDate_4022.108
## [199] DT_0.27 xml2_1.3.3
## [201] colorRamps_2.3.1 BiocNeighbors_1.16.0
## [203] bit_4.0.5 jpeg_0.1-10
## [205] ggraph_2.1.0 pkgconfig_2.0.3
## [207] gargle_1.3.0 rstatix_0.7.2
## [209] knitr_1.42